38 research outputs found

    Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains

    Nowadays there is a tremendous amount of unstructured data, often represented by texts, which is created and stored in a variety of forms in many domains such as patients' health records, social network comments, scientific publications, and so on. This volume of data represents an invaluable source of knowledge, but mining it is challenging for machines. At the same time, novel tools as well as advanced methodologies have been introduced in several domains, improving the efficacy and efficiency of data-based services. Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed the data into Machine Learning methodologies, and produce services or resources that facilitate the execution of certain tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through previously unexplored methodologies. Furthermore, this thesis investigates the use of state-of-the-art tools to move data from texts to graphs that represent the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in the form of graphs is presented. The thesis contributes to the scientific literature in terms of both results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings.

    Understanding class representations: An intrinsic evaluation of zero-shot text classification

    Frequently, Text Classification is limited by insufficient training data. Zero-Shot Classification addresses this problem by including external class definitions and exploiting the relations between classes seen during training and unseen (zero-shot) classes. However, it requires a class embedding space capable of accurately representing the semantic relatedness between classes. This work defines an intrinsic evaluation based on greater-than constraints to provide a better understanding of this relatedness. The results imply that textual embeddings are able to capture more semantics than Knowledge Graph embeddings, but combining both modalities yields the best performance.
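
    A minimal sketch of how such a greater-than evaluation could be operationalized (the embeddings, constraint set, and cosine scoring below are illustrative assumptions, not the paper's exact protocol): each constraint states that one class pair should be more semantically related than another, and the class embedding space is scored by the fraction of constraints it satisfies.

```python
# Minimal sketch (not the paper's exact protocol): score a class embedding
# space by the fraction of greater-than constraints it satisfies, where each
# constraint says one class pair should be more similar than another.
# The embeddings and constraints below are toy assumptions.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def constraint_accuracy(class_emb, constraints):
    """class_emb: dict label -> vector.
    constraints: list of ((a, b), (c, d)) meaning sim(a, b) > sim(c, d)."""
    satisfied = sum(
        1 for (a, b), (c, d) in constraints
        if cosine(class_emb[a], class_emb[b]) > cosine(class_emb[c], class_emb[d])
    )
    return satisfied / len(constraints)

# Toy check: "soccer" should be closer to "tennis" than to "politics".
rng = np.random.default_rng(0)
emb = {c: rng.normal(size=50) for c in ["soccer", "tennis", "politics", "economy"]}
constraints = [(("soccer", "tennis"), ("soccer", "politics")),
               (("politics", "economy"), ("politics", "tennis"))]
print(constraint_accuracy(emb, constraints))
```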

    Towards a representation of temporal data in archival records: Use cases and requirements

    Archival records are essential sources of information for historians and digital humanists seeking to understand history. For modern information systems they are often analysed and integrated into Knowledge Graphs for better access, interoperability, and re-use. However, due to restrictions in the representation of RDF predicates, temporal data within archival records is challenging to model. This position paper describes requirements for modelling temporal data in archival records, based on ongoing research projects in which archival records are analysed and integrated into Knowledge Graphs for research and exploration.
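
    A small illustrative sketch of the underlying modelling problem (the rdflib-based pattern, namespace, and property names below are assumptions for illustration, not the paper's proposal): a plain RDF triple cannot directly carry a validity interval because predicates are binary, so temporal qualifiers are often attached to an intermediate node.

```python
# Illustrative sketch (not the paper's model): a plain triple such as
# (PersonA, holdsOffice, MayorOfBerlin) cannot carry "from 1900 to 1905"
# directly, because RDF predicates are binary. One common workaround is an
# intermediate node that carries the temporal qualifiers.
from rdflib import Graph, Namespace, Literal, BNode, RDF, XSD

EX = Namespace("http://example.org/")   # illustrative namespace
g = Graph()

tenure = BNode()                        # node representing the qualified statement
g.add((tenure, RDF.type, EX.OfficeTenure))
g.add((tenure, EX.person, EX.PersonA))
g.add((tenure, EX.office, EX.MayorOfBerlin))
g.add((tenure, EX.startDate, Literal("1900-01-01", datatype=XSD.date)))
g.add((tenure, EX.endDate, Literal("1905-12-31", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```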

    TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study

    Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of patients' health states. All these data can be analysed and employed to provide novel services that help people and domain experts with common healthcare tasks. However, many technologies such as Deep Learning and tools like Word Embeddings have started to be investigated only recently, and many challenges remain open when it comes to healthcare domain applications. To address these challenges, we propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations of data such as Word Embeddings. We have employed pre-trained Word Embeddings, namely GloVe and Word2Vec, as well as our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the Deep Learning approaches against the traditional TF-IDF representation used with a Support Vector Machine and a Multilayer Perceptron (our baselines). The obtained results suggest that the latter outperforms the Deep Learning approaches regardless of the word embeddings used. Our preliminary results indicate that there are specific features that make the dataset biased in favour of traditional machine learning approaches.
    Comment: 12 pages, 2 figures, 2 tables; SmartPhil 2020, First Workshop on Smart Personal Health Interfaces, associated with ACM IUI 2020.
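
    A minimal sketch of the kind of TF-IDF baseline the study compares against, i.e. SVM and MLP classifiers over TF-IDF vectors in a one-vs-rest, multi-label setting (the toy notes, labels, and hyperparameters are illustrative and not taken from the paper):

```python
# Minimal sketch of a TF-IDF baseline of the kind the study compares against
# (SVM and MLP over TF-IDF vectors, one-vs-rest for multiple morbidity labels).
# The toy notes, labels, and hyperparameters below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

notes = [
    "patient with long standing hypertension and obesity",
    "history of asthma, uses inhaler daily",
    "obesity noted, advised diet and exercise",
    "no relevant morbidity reported",
]
labels = [["hypertension", "obesity"], ["asthma"], ["obesity"], []]

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(notes)
Y = MultiLabelBinarizer().fit_transform(labels)

svm = OneVsRestClassifier(LinearSVC()).fit(X, Y)        # one binary SVM per morbidity
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, Y)

print(svm.predict(X))
print(mlp.predict(X))
```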

    Ontology modelling for materials science experiments

    Materials are either an enabler of or a bottleneck for the vast majority of technological innovations. The digitization of materials and processes is mandatory to create live production environments which represent physical entities and their aggregations, and thus allow materials changes to be represented, shared, and understood. However, a common standard formalization of materials knowledge in the form of taxonomies, ontologies, or knowledge graphs has not yet been achieved. This paper sketches the efforts in modelling an ontology prototype to describe Materials Science experiments. It describes what is expected from the ontology by introducing a use case in which a process chain driven by the ontology enables the curation and understanding of experiments.

    TBVAC2020: Advancing tuberculosis vaccines from discovery to clinical development

    TBVAC2020 is a research project supported by the Horizon 2020 program of the European Commission (EC). It aims at the discovery and development of novel tuberculosis (TB) vaccines, from preclinical research to early clinical assessment. The project builds on previous collaborations, funded from 1998 onwards through the EC framework programs FP5, FP6, and FP7. It has succeeded in attracting new partners from outstanding laboratories all over the world, now totaling 40 institutions. Alongside the development of novel vaccines, TB biomarker development is also considered an important asset to facilitate rational vaccine selection and development. In addition, TBVAC2020 offers portfolio management that provides selection criteria for entry, gating, and priority setting of novel vaccines at an early developmental stage. The TBVAC2020 consortium, coordinated by TBVI, facilitates collaboration and early data sharing between partners with the common aim of working toward the development of an effective TB vaccine. Close links with funders and other consortia with shared interests further contribute to this goal.

    LexTex: a framework to generate lexicons using WordNet word senses in domain specific categories

    Lexicons have risen as alternative resources to common supervised methods for classification or regression in different domains (e.g., Sentiment Analysis). However, such resources often lack important domain context, and it is not possible to tune, edit, or improve them for new domains and data. With the exponential production of data and annotations witnessed today in several domains, leveraging lexical resources to improve existing lexicons becomes a must. In this work, a novel framework is provided to build lexicons independently of the target domain and of the input categories into which each text needs to be classified. It employs state-of-the-art Natural Language Processing and Word Sense Disambiguation tools and techniques to make the method as general as possible. The framework takes as input a heterogeneous collection of texts annotated with a fixed number of categories. Its output is a list of WordNet word senses with weights for each category. We prove the effectiveness of the framework taking the Emotion Detection task as a case study, employing the generated lexicons within that domain. Additionally, the paper shows a use case on human-robot interaction within the Emotion Detection task. Furthermore, we applied our methodology in several other domains and compared our approach against common supervised methods (regressors), showing the effectiveness of the generated lexicons. By freely providing the framework, we aim to encourage and disseminate the production of context-aware and domain-specific lexicons in other domains as well.
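
    A rough sketch of the general idea, not the LexTex implementation: disambiguate words in category-annotated texts into WordNet senses and accumulate normalized per-category weights (NLTK's Lesk algorithm is used here purely as a stand-in WSD component).

```python
# Rough sketch of the general idea (not the LexTex implementation): map words
# in category-annotated texts to WordNet senses with a WSD step (NLTK's Lesk
# here, as a stand-in) and accumulate normalized per-category weights.
# Requires nltk plus the 'punkt' and 'wordnet' corpora.
from collections import Counter, defaultdict
from nltk import word_tokenize
from nltk.wsd import lesk

def build_lexicon(annotated_texts):
    """annotated_texts: iterable of (text, category) pairs.
    Returns dict: category -> {synset_name: weight}."""
    counts = defaultdict(Counter)
    for text, category in annotated_texts:
        tokens = word_tokenize(text.lower())
        for token in tokens:
            synset = lesk(tokens, token)        # word sense disambiguation
            if synset is not None:
                counts[category][synset.name()] += 1
    lexicon = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        lexicon[category] = {sense: c / total for sense, c in counter.items()}
    return lexicon

corpus = [("I am so happy and delighted today", "joy"),
          ("This is terrifying and I am scared", "fear")]
print(build_lexicon(corpus))
```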

    An advanced algorithm for fetal heart rate estimation from non-invasive low electrode density recordings

    Non-invasive fetal electrocardiography is still an open research issue. The recent publication on PhysioNet of an annotated dataset providing four-channel non-invasive abdominal ECG traces promoted an international challenge on the topic. Starting from that dataset, an algorithm was developed for the identification of fetal QRS complexes from a reduced number of electrodes and without any a priori information about electrode positioning, entering the top ten best-performing open-source algorithms presented at the challenge. In this paper, an improved version of that algorithm is presented and evaluated using the same challenge metrics. It is mainly based on the subtraction of the maternal QRS complexes in every lead, obtained by synchronized averaging of morphologically similar complexes, the filtering of the maternal P and T waves, and the enhancement of the fetal QRS through independent component analysis (ICA) applied to the processed signals before a final fetal QRS detection stage. The RR time series of both the mother and the fetus are analysed to enhance pseudo-periodicity, with the aim of correcting wrong annotations. The algorithm was designed and extensively evaluated on the open dataset A (N=75), and finally evaluated on datasets B (N=100) and C (N=272) to obtain mean scores over data not used during algorithm development. Compared to the results achieved by the previous version of the algorithm, the current version would rank 5th and 4th in the final rankings for events 1 and 2, reserved for the open-source challenge entries, taking into account both official and unofficial entrants. On dataset A, the algorithm achieves 0.982 median sensitivity and 0.976 median positive predictivity.
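
    A simplified sketch of two of the stages described above, run on synthetic data: maternal QRS template subtraction via synchronized averaging, followed by ICA across the residual channels (window lengths, peak thresholds, and the toy signal are placeholders, not the algorithm's actual settings).

```python
# Simplified sketch of two stages described above, on synthetic data:
# (1) subtract a maternal QRS template obtained by synchronized averaging,
# (2) run ICA on the residual channels to enhance the weaker (fetal) component.
# Window lengths, thresholds, and the toy signal are placeholders only.
import numpy as np
from scipy.signal import find_peaks
from sklearn.decomposition import FastICA

fs = 1000                                    # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
# Toy 4-channel abdominal mixture: strong "maternal" spikes plus weak "fetal" spikes.
maternal = (np.sin(2 * np.pi * 1.2 * t) > 0.999).astype(float)
fetal = (np.sin(2 * np.pi * 2.1 * t) > 0.999).astype(float)
X = np.stack([rng.uniform(0.8, 1.2) * maternal + rng.uniform(0.1, 0.3) * fetal
              + 0.01 * rng.normal(size=t.size) for _ in range(4)])

def subtract_maternal(channel, half_window=60):
    """Remove a maternal QRS template obtained by synchronized averaging."""
    peaks, _ = find_peaks(channel, height=0.5 * channel.max(), distance=int(0.4 * fs))
    windows = [channel[p - half_window:p + half_window]
               for p in peaks if half_window <= p < channel.size - half_window]
    template = np.mean(windows, axis=0)
    cleaned = channel.copy()
    for p in peaks:
        if half_window <= p < channel.size - half_window:
            cleaned[p - half_window:p + half_window] -= template
    return cleaned

residual = np.stack([subtract_maternal(ch) for ch in X])
sources = FastICA(n_components=4, random_state=0).fit_transform(residual.T).T
best = np.argmax([np.max(np.abs(s)) / np.std(s) for s in sources])   # crude spiky-component pick
fetal_peaks, _ = find_peaks(np.abs(sources[best]),
                            height=4 * np.std(sources[best]), distance=int(0.25 * fs))
print(len(fetal_peaks), "candidate fetal QRS detections")
```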

    Deep Learning meets Knowledge Graphs for Scholarly Data Classification

    The amount of scientific literature grows continuously, which poses an increasing challenge for researchers to manage, find, and explore research results. Therefore, the classification of scientific work is widely applied to enable retrieval, to support the search for suitable reviewers during the reviewing process, and in general to organize the existing literature according to a given schema. The automation of this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, fine-grained classes and new research fields in particular do not provide sufficient training data to automate the process. Additionally, given the large number of non-mutually-exclusive classes, it is often difficult and computationally expensive to train models able to deal with multi-class, multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework as a solution for multi-label text classification of scholarly papers in Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, explicitly provided by latent representations of the class labels. This study uses Knowledge Graphs as a source of these required external class definitions, identifying corresponding entities in DBpedia to improve the overall classification.
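
    A bare-bones sketch of the core idea, not the paper's architecture: score a document representation against latent class-label vectors (e.g. derived from DBpedia entities) and apply a per-class sigmoid and threshold to obtain a multi-label output (all vectors and the threshold are toy assumptions).

```python
# Bare-bones sketch of the core idea (not the paper's architecture): score a
# document representation against latent class-label vectors (stand-ins for
# embeddings of DBpedia entities) and threshold per class for multi-label output.
import numpy as np

rng = np.random.default_rng(42)
class_labels = ["Machine_learning", "Semantic_Web", "Computer_network"]
class_emb = {c: rng.normal(size=100) for c in class_labels}   # stand-in for KG embeddings
doc_emb = rng.normal(size=100)                                # stand-in for a text encoder output

def predict_labels(doc_vec, class_emb, threshold=0.5):
    scores = {}
    for label, vec in class_emb.items():
        logit = np.dot(doc_vec, vec) / np.sqrt(doc_vec.size)  # scaled dot-product score
        scores[label] = 1.0 / (1.0 + np.exp(-logit))          # per-class sigmoid (multi-label)
    predicted = {label: s for label, s in scores.items() if s >= threshold}
    return predicted, scores

predicted, all_scores = predict_labels(doc_emb, class_emb)
print(predicted)
```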

    Modeling and extending ecological networks using land similarity

    Complex network analysis is applied to topological models of ecological networks, both to extrapolate their advanced properties and as part of land management activities. Commonly employed methods tend to focus on a single target species. This is satisfactory for cognitive analysis, but the limited view provided by these models results in a lack of the general information needed for land planning. Similarity scores computed for pairs of nature protection areas are proposed as a building block of a general model to address this shortcoming.
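
    A toy sketch of how such similarity scores could induce a species-agnostic network of protection areas (the land-cover attributes, cosine similarity, and threshold below are illustrative assumptions, not the paper's model).

```python
# Illustrative sketch (attributes, scores, and threshold are made up): link
# pairs of protection areas whose land-attribute vectors are similar enough,
# yielding a species-agnostic network that complements single-species models.
import networkx as nx
import numpy as np

# Toy land-cover composition vectors (e.g. fractions of forest, wetland, grassland).
areas = {
    "SiteA": np.array([0.7, 0.1, 0.2]),
    "SiteB": np.array([0.6, 0.2, 0.2]),
    "SiteC": np.array([0.1, 0.8, 0.1]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

G = nx.Graph()
G.add_nodes_from(areas)
for u in areas:
    for v in areas:
        if u < v:
            score = cosine(areas[u], areas[v])
            if score >= 0.8:                  # illustrative similarity threshold
                G.add_edge(u, v, weight=score)

print(list(G.edges(data=True)))
```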